Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm

Authors

  • Bjarke Felbo
  • Alan Mislove
  • Anders Søgaard
  • Iyad Rahwan
  • Sune Lehmann

Abstract

NLP tasks are often limited by the scarcity of manually annotated data. In social media sentiment analysis and related tasks, researchers have therefore used binarized emoticons and specific hashtags as forms of distant supervision. Our paper shows that by extending the distant supervision to a more diverse set of noisy labels, the models can learn richer representations. Through emoji prediction on a dataset of 1246 million tweets containing one of 64 common emojis, we obtain state-of-the-art performance on 8 benchmark datasets within emotion, sentiment and sarcasm detection using a single pretrained model. Our analyses confirm that the diversity of our emotional labels yields a performance improvement over previous distant supervision approaches.
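
To make the distant-supervision idea concrete, here is a minimal Python sketch of how raw tweets could be turned into noisy (text, emoji-label) pairs for an emoji-prediction pretraining task. It is not the authors' preprocessing pipeline. The five-emoji EMOJI_LABELS set is a hypothetical stand-in for the paper's 64 emojis (which are not listed here), and the function name make_examples and the policy of emitting one example per distinct emoji in a tweet are illustrative assumptions.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical label set: five emojis standing in for the paper's 64.
# Each is a single Unicode code point, so plain regex alternation works.
EMOJI_LABELS = ["😂", "😍", "😭", "😡", "🙏"]
LABEL_INDEX = {emoji: i for i, emoji in enumerate(EMOJI_LABELS)}

# One regex that matches any emoji in the label set.
EMOJI_PATTERN = re.compile("|".join(re.escape(e) for e in EMOJI_LABELS))


def make_examples(tweets: Iterable[str]) -> List[Tuple[str, int]]:
    """Turn raw tweets into noisy (text, label) pairs for emoji prediction.

    Tweets without any label-set emoji are skipped. A tweet with several
    distinct emojis yields one example per distinct emoji (an assumed policy).
    """
    examples: List[Tuple[str, int]] = []
    for tweet in tweets:
        # Distinct emojis, kept in order of first appearance.
        found = list(dict.fromkeys(EMOJI_PATTERN.findall(tweet)))
        if not found:
            continue
        # Strip the emojis so the input text does not reveal its own label,
        # then collapse the leftover whitespace.
        text = " ".join(EMOJI_PATTERN.sub(" ", tweet).split())
        for emoji in found:
            examples.append((text, LABEL_INDEX[emoji]))
    return examples


if __name__ == "__main__":
    tweets = ["so funny 😂😂", "no emoji here", "love it 😍 thanks 🙏"]
    for text, label in make_examples(tweets):
        print(label, EMOJI_LABELS[label], text)
```

Here the tweet without an emoji is dropped, the repeated 😂 produces a single example, and the tweet with two different emojis produces two examples sharing the same text. Compared with binarized emoticon labels (positive vs. negative), each emoji keeps its own class, which is the kind of label diversity the abstract credits for the richer representations. A production pipeline would likely also need tokenization and handling of multi-code-point emojis (skin-tone modifiers, ZWJ sequences), which this sketch omits.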

Similar resources

Emotion Analysis of Twitter Data That Use Emoticons and Emoji Ideograms

Twitter is an online social networking service on which users worldwide publish their opinions on a variety of topics, discuss current issues, complain, and express many kinds of emotions. Therefore, Twitter is a rich source of data for opinion mining, sentiment and emotion analysis. This paper focuses on this issue by analysing symbols called emotion tokens, including emotion symbols (e.g. emo...

Approaches for Computational Sarcasm Detection: A Survey

Sentiment analysis deals not only with detecting positive and negative sentiment in text but also with the prevalence and challenges of sarcasm in sentiment-bearing text. Automatic sarcasm detection deals with detecting sarcasm in text. In recent years, work on sarcasm detection has gained popularity and has wide applicability in sentiment analysis. This paper compiles the...

emoji2vec: Learning Emoji Representations from their Description

Many current natural language processing applications for social media rely on representation learning and utilize pre-trained word embeddings. There currently exist several publicly-available, pre-trained sets of word embeddings, but they contain few or no emoji representations even as emoji usage in social media has increased. In this paper we release emoji2vec, pre-trained embeddings for all...

Joint Emoji Classification and Embedding Learning

In conversational settings, emoji are widely used to express human feelings, which greatly enriches plain text. Large numbers of utterances containing emoji are produced manually by people on social media platforms every day, giving emoji a strong influence on everyday life. In the academic community, researchers often use utterances containing emoji as annotated...

CrystalNest at SemEval-2017 Task 4: Using Sarcasm Detection for Enhancing Sentiment Classification and Quantification

This paper describes a system developed for a shared sentiment analysis task and its subtasks organized by SemEval-2017. A key feature of our system is the embedded ability to detect sarcasm in order to enhance the performance of sentiment classification. We first constructed an affect-cognition-sociolinguistics sarcasm features model and trained an SVM-based classifier for detecting sarcastic e...

Publication date: 2017